A possible downstream analysis in R is an over-representation analysis (ORA) to determine whether genes from specific pathways (e.g. KEGG pathways) are present more than expected in our network.
For that we need the following inputs: a matrix to evaluate (mat) and a feature set (pathways). To create the feature set, the pathway list from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, the pathway list from the Gene Ontology Biological Processes (GOBP) database and the pathway list from ECM-related mechanisms (NABA) are loaded. Further, to set up the evaluation matrix, the PKN from COSMOS is loaded, PKN nodes that are not present in the pathways removed and metabolic interactions discarded. Then, by associating all nodes from the PKN/KEGG/GOBP/NABA pathways present in the inferred biological network with value 1 and all nodes not present with value 0, we can create the matrix.
library(cosmosR)
library(MOFA2)
library(readr)
library(ggplot2)
library(ggfortify)
library(dplyr)
library(reshape2)
library(liana)
library(decoupleR)
library(moon)
library(pheatmap)
library(gridExtra)
library(liana)
library(GSEABase)
library(tidyr)
library(RCy3)
combined_SIF_reduced <- read_csv(file = "results/cosmos/SIF_mofamoon_combined_reduced.csv")
## Rows: 1163 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Node1, Node2
## dbl (2): Sign, Weight
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
combined_ATT_reduced <- read_csv(file = "results/cosmos/ATT_mofamoon_combined_reduced.csv")
## Rows: 305 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Nodes
## dbl (9): NodeType, ZeroAct, UpAct, DownAct, AvgAct, measured, Activity, sign...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Feature set
pathways_df <- data.frame(rbind(import_gmt("support/c2.cp.v7.2.symbols.gmt"), import_gmt("support/c5.go.bp.v2022.1.Hs.symbols.gmt")))
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 9%
|
|======= | 10%
|
|======= | 11%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========= | 14%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================ | 24%
|
|================= | 24%
|
|================= | 25%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 29%
|
|===================== | 30%
|
|===================== | 31%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================= | 34%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 39%
|
|============================ | 40%
|
|============================ | 41%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|============================== | 44%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 49%
|
|=================================== | 50%
|
|=================================== | 51%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 59%
|
|========================================== | 60%
|
|========================================== | 61%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================ | 64%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 69%
|
|================================================= | 70%
|
|================================================= | 71%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|=================================================== | 74%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 79%
|
|======================================================== | 80%
|
|======================================================== | 81%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|========================================================== | 84%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 89%
|
|=============================================================== | 90%
|
|=============================================================== | 91%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================= | 94%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 99%
|
|======================================================================| 100%
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|===== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 9%
|
|======= | 10%
|
|======= | 11%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========= | 14%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 15%
|
|=========== | 16%
|
|============ | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 19%
|
|============== | 20%
|
|============== | 21%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================ | 24%
|
|================= | 24%
|
|================= | 25%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 29%
|
|===================== | 30%
|
|===================== | 31%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================= | 34%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 35%
|
|========================= | 36%
|
|========================== | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 39%
|
|============================ | 40%
|
|============================ | 41%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|============================== | 44%
|
|=============================== | 44%
|
|=============================== | 45%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 49%
|
|=================================== | 50%
|
|=================================== | 51%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 55%
|
|======================================= | 56%
|
|======================================== | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 59%
|
|========================================== | 60%
|
|========================================== | 61%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================ | 64%
|
|============================================= | 64%
|
|============================================= | 65%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 69%
|
|================================================= | 70%
|
|================================================= | 71%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|=================================================== | 74%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 79%
|
|======================================================== | 80%
|
|======================================================== | 81%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|========================================================== | 84%
|
|=========================================================== | 84%
|
|=========================================================== | 85%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 89%
|
|=============================================================== | 90%
|
|=============================================================== | 91%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================= | 94%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 99%
|
|======================================================================| 100%
pathways_NABA_KEGG_GOBP <- data.frame(pathways_df[grepl("NABA_",pathways_df$term) | grepl("KEGG_",pathways_df$term) | grepl("GOBP_", pathways_df$term),])
pathways_NABA_KEGG_GOBP$mor <- 1
## Matrix
data("meta_network")
prots <- unique(c(meta_network$source, meta_network$target))
prots <- gsub("Gene[0-9]+__","",prots)
prots <- gsub("_reverse","",prots)
prots <- prots[!grepl("Metab",prots)]
prots <- prots[prots %in% pathways_NABA_KEGG_GOBP$gene]
sig_prots <- combined_ATT_reduced$Nodes
sig_prots <- gsub("Gene[0-9]+__","",sig_prots)
sig_prots <- gsub("_reverse","",sig_prots)
sig_prots <- sig_prots[!grepl("Metab",sig_prots)]
sig_prots <- unique(sig_prots)
mat <- matrix(0,nrow = length(prots),1)
row.names(mat) <- prots
mat[row.names(mat) %in% sig_prots, 1] <- 1
mat <- mat[order(mat[,1],decreasing = T), ,drop = F]
## ORA analysis
pathway_ORA_NABA_KEGG_GOBP_all <- as.data.frame(decoupleR::run_ora(mat = mat,
network = pathways_NABA_KEGG_GOBP,
.source = term,
.target = gene,
n_up = length(sig_prots),
n_background = length(prots),
minsize = 0))
## Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
## Using compatibility `.name_repair`.
After the ORA has been performed, the result is returned as a data frame specifying enriched pathways with a p value and a corresponding score \((-log_{10}(p)\)). We can visualize the results with a ggplot showing the highest scoring pathways.
pathway_ORA_NABA_KEGG_GOBP_all <- pathway_ORA_NABA_KEGG_GOBP_all[order(pathway_ORA_NABA_KEGG_GOBP_all$score, decreasing = T),]
head(pathway_ORA_NABA_KEGG_GOBP_all, n = 25)
## statistic source
## 4543 ora GOBP_POSITIVE_REGULATION_OF_MACROMOLECULE_BIOSYNTHETIC_PROCESS
## 4911 ora GOBP_POSITIVE_REGULATION_OF_TRANSCRIPTION_BY_RNA_POLYMERASE_II
## 4809 ora GOBP_POSITIVE_REGULATION_OF_RNA_METABOLIC_PROCESS
## 3998 ora GOBP_PHOSPHORYLATION
## 6730 ora GOBP_RESPONSE_TO_ENDOGENOUS_STIMULUS
## 3902 ora GOBP_PEPTIDYL_AMINO_ACID_MODIFICATION
## 4609 ora GOBP_POSITIVE_REGULATION_OF_MULTICELLULAR_ORGANISMAL_PROCESS
## 724 ora GOBP_CELLULAR_RESPONSE_TO_ENDOGENOUS_STIMULUS
## 7845 ora KEGG_PATHWAYS_IN_CANCER
## 5544 ora GOBP_REGULATION_OF_CELL_DIFFERENTIATION
## 6758 ora GOBP_RESPONSE_TO_GROWTH_FACTOR
## 247 ora GOBP_APOPTOTIC_PROCESS
## 7415 ora GOBP_TISSUE_DEVELOPMENT
## 6333 ora GOBP_REGULATION_OF_PROTEIN_MODIFICATION_PROCESS
## 5556 ora GOBP_REGULATION_OF_CELL_POPULATION_PROLIFERATION
## 4822 ora GOBP_POSITIVE_REGULATION_OF_SIGNALING
## 1286 ora GOBP_EMBRYO_DEVELOPMENT
## 4392 ora GOBP_POSITIVE_REGULATION_OF_GENE_EXPRESSION
## 1388 ora GOBP_ENZYME_LINKED_RECEPTOR_PROTEIN_SIGNALING_PATHWAY
## 6843 ora GOBP_RESPONSE_TO_OXYGEN_CONTAINING_COMPOUND
## 7528 ora GOBP_TUBE_DEVELOPMENT
## 6108 ora GOBP_REGULATION_OF_MULTICELLULAR_ORGANISMAL_DEVELOPMENT
## 940 ora GOBP_CIRCULATORY_SYSTEM_DEVELOPMENT
## 4296 ora GOBP_POSITIVE_REGULATION_OF_DEVELOPMENTAL_PROCESS
## 784 ora GOBP_CELLULAR_RESPONSE_TO_OXYGEN_CONTAINING_COMPOUND
## condition score p_value
## 4543 V1 62.05545 8.801275e-63
## 4911 V1 58.55949 2.757464e-59
## 4809 V1 58.21191 6.138854e-59
## 3998 V1 48.07249 8.462801e-49
## 6730 V1 47.72711 1.874526e-48
## 3902 V1 46.92309 1.193730e-47
## 4609 V1 45.60088 2.506801e-46
## 724 V1 44.27596 5.297073e-45
## 7845 V1 42.52870 2.960046e-43
## 5544 V1 41.74795 1.786714e-42
## 6758 V1 40.29706 5.045859e-41
## 247 V1 39.87006 1.348786e-40
## 7415 V1 39.62636 2.363982e-40
## 6333 V1 38.10984 7.765288e-39
## 5556 V1 37.14723 7.124774e-38
## 4822 V1 36.61819 2.408864e-37
## 1286 V1 36.46002 3.467177e-37
## 4392 V1 34.45955 3.470980e-35
## 1388 V1 34.37269 4.239483e-35
## 6843 V1 34.14930 7.090842e-35
## 7528 V1 33.97833 1.051170e-34
## 6108 V1 33.83996 1.445580e-34
## 940 V1 33.66766 2.149532e-34
## 4296 V1 33.65815 2.197100e-34
## 784 V1 33.33641 4.608775e-34
ggplot(pathway_ORA_NABA_KEGG_GOBP_all[pathway_ORA_NABA_KEGG_GOBP_all$score > 30,]) +
geom_point(aes(score, reorder(source, score, increasing = T), color = -log10(p_value)))
ggplot(pathway_ORA_NABA_KEGG_GOBP_all[grepl("NABA", pathway_ORA_NABA_KEGG_GOBP_all$source),]) +
geom_point(aes(score, reorder(source, score, increasing = T), color = -log10(p_value)))
ggplot(pathway_ORA_NABA_KEGG_GOBP_all[grepl("GOBP", pathway_ORA_NABA_KEGG_GOBP_all$source) & pathway_ORA_NABA_KEGG_GOBP_all$score > 30,]) +
geom_point(aes(score, reorder(source, score, increasing = T), color = -log10(p_value)))
ggplot(pathway_ORA_NABA_KEGG_GOBP_all[grepl("KEGG", pathway_ORA_NABA_KEGG_GOBP_all$source) & pathway_ORA_NABA_KEGG_GOBP_all$score > 15,]) +
geom_point(aes(score, reorder(source, score, increasing = T), color = -log10(p_value)))
Here, for instance, we can see that proteins from pathways in cancer (KEGG) are enriched - in accordance with the fact that we are considering cancer cell lines.
We can further identify and highlight the overlap between nodes present in our network and proteins from the (top) significantly enriched KEGG/NABA/GOBP pathways.
sig_pathways <- pathway_ORA_NABA_KEGG_GOBP_all[pathway_ORA_NABA_KEGG_GOBP_all$score > 0,"source"]
pathways_adjacency <- dcast(pathways_NABA_KEGG_GOBP, gene~term)
## Using mor as value column: use value.var to override.
pathways_adjacency <- pathways_adjacency[pathways_adjacency$gene %in% sig_prots,]
row.names(pathways_adjacency) <- pathways_adjacency$gene
pathways_adjacency <- pathways_adjacency[,sig_pathways]
pathways_adjacency[is.na(pathways_adjacency)] <- 0
pheatmap(t(pathways_adjacency), border_color = T, cluster_rows = F, color = c("lightyellow","red"), breaks = c(0,0.5,1), fontsize = 8, show_rownames = F)
sig_pathways <- pathway_ORA_NABA_KEGG_GOBP_all[pathway_ORA_NABA_KEGG_GOBP_all$score > 30,"source"]
pathways_adjacency <- pathways_adjacency[,sig_pathways]
pheatmap(t(pathways_adjacency), border_color = T, cluster_rows = F, color = c("lightyellow","red"), breaks = c(0,0.5,1), fontsize = 7)
In addition, we can perform an analysis that performs ORA independently for all nodes using the starting node and n-step-distant nodes (in this case 2 steps), enabling us to specifically inspect individual nodes (e.g. SRC) or pathways (e.g. KEGG pathways in cancer).
## Matrix
data("meta_network")
prots <- unique(c(meta_network$source, meta_network$target))
prots <- gsub("Gene[0-9]+__","",prots)
prots <- gsub("_reverse","",prots)
prots <- prots[prots %in% pathways_NABA_KEGG_GOBP$gene]
sig_prots <- combined_SIF_reduced[,c(1,4,2)]
names(sig_prots) <- c("source","interaction","target")
background <- unique(c(sig_prots$source,sig_prots$target))
background_ORA <- gsub("Gene[0-9]+__","", background)
background_ORA <- gsub("_reverse","", background_ORA)
background_ORA <- unique(background_ORA)
pathway_ORA_NABA_KEGG_GOBP_list <- list()
for(node in background){
nodes <- cosmosR:::keep_controllable_neighbours(sig_prots, n_steps = 2, input_nodes = node) #keeps the nodes in network that are no more then n_steps away from the starting nodes
nodes <- unique(c(nodes$source,nodes$target))
nodes <- gsub("Gene[0-9]+__","",nodes)
nodes <- gsub("_reverse","",nodes)
nodes <- nodes[nodes %in% pathways_NABA_KEGG_GOBP$gene]
if (length(nodes) > 0) {
mat <- matrix(0,nrow = length(background_ORA),1)
row.names(mat) <- background_ORA
mat[row.names(mat) %in% nodes, 1] <- 1
mat <- mat[order(mat[,1],decreasing = T), ,drop = F]
pathways_NABA_KEGG_GOBP_filtered <- pathways_NABA_KEGG_GOBP %>%
group_by(term) %>%
filter(n_distinct(pathways_NABA_KEGG_GOBP[gene %in% nodes,])>0) %>%
as.data.frame()
pathway_ORA_NABA_KEGG_GOBP <- as.data.frame(decoupleR::run_ora(mat = mat,
network = pathways_NABA_KEGG_GOBP_filtered,
.source = term,
.target = gene,
n_up = length(nodes),
n_background = length(prots),
minsize = 0))
pathway_ORA_NABA_KEGG_GOBP <- pathway_ORA_NABA_KEGG_GOBP[,c(4,2)]
names(pathway_ORA_NABA_KEGG_GOBP) <- c(node,"pathway")
pathway_ORA_NABA_KEGG_GOBP_list[[node]] <- pathway_ORA_NABA_KEGG_GOBP
} else {
pathway_ORA_NABA_KEGG_GOBP <- data.frame(rep(0, length(unique(pathways_NABA_KEGG_GOBP[pathways_NABA_KEGG_GOBP$gene %in% background==T,2]))),
unique(pathways_NABA_KEGG_GOBP[pathways_NABA_KEGG_GOBP$gene %in% background==T,2]))
names(pathway_ORA_NABA_KEGG_GOBP) <- c(node,"pathway")
pathway_ORA_NABA_KEGG_GOBP_list[[node]] <- pathway_ORA_NABA_KEGG_GOBP
}
}
save(pathway_ORA_NABA_KEGG_GOBP_list, file = "results/pathway_enrichment/pathway_ORA_NABA_KEGG_GOBP_list.RData")
load("results/pathway_enrichment/pathway_ORA_NABA_KEGG_GOBP_list.RData")
pathway_ORA_NABA_KEGG_GOBP_combined <- data.frame(Reduce(function(x,y) merge(x,y,all.x = T, all.y = T), pathway_ORA_NABA_KEGG_GOBP_list))
row.names(pathway_ORA_NABA_KEGG_GOBP_combined) <- pathway_ORA_NABA_KEGG_GOBP_combined$pathway
pathway_ORA_NABA_KEGG_GOBP_combined <- pathway_ORA_NABA_KEGG_GOBP_combined[,-1]
pathway_ORA_NABA_KEGG_GOBP_combined[is.na(pathway_ORA_NABA_KEGG_GOBP_combined)] <- 0
pheatmap(pathway_ORA_NABA_KEGG_GOBP_combined, fontsize = 6, cluster_rows = T, show_rownames = F)
Since cancer cell lines are considered here, we can specifically look at known over-activated cancer pathways.
cancer_control <- pathway_ORA_NABA_KEGG_GOBP_combined["KEGG_PATHWAYS_IN_CANCER",]
cancer_control <- t(cancer_control[,order(cancer_control[1,], decreasing = F)])
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
ggplot(cancer_control)+geom_point(aes(reorder(rownames(cancer_control), KEGG_PATHWAYS_IN_CANCER, increasing = T), KEGG_PATHWAYS_IN_CANCER))+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Also specific nodes (e.g. SRC) can be inspected to visualize certain pathway involvement.
SRC_pathway <- pathway_ORA_NABA_KEGG_GOBP_list[["SRC"]][order(pathway_ORA_NABA_KEGG_GOBP_list[["SRC"]]$SRC, decreasing = T),][pathway_ORA_NABA_KEGG_GOBP_list[["SRC"]][order(pathway_ORA_NABA_KEGG_GOBP_list[["SRC"]]$SRC, decreasing = T),]$SRC,]
ggplot(SRC_pathway)+geom_point(aes(reorder(pathway, SRC, increasing = T), SRC))+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
HRAS_pathway <- pathway_ORA_NABA_KEGG_GOBP_list[["HRAS"]][order(pathway_ORA_NABA_KEGG_GOBP_list[["HRAS"]]$HRAS, decreasing = T),][pathway_ORA_NABA_KEGG_GOBP_list[["HRAS"]][order(pathway_ORA_NABA_KEGG_GOBP_list[["HRAS"]]$HRAS, decreasing = T),]$HRAS,]
ggplot(HRAS_pathway)+geom_point(aes(reorder(pathway, HRAS, increasing = T), HRAS))+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
To compare the ORA result of the final network, we can also perform principal component gene set analysis (PCGSA) on the MOFA factors containing the protein weights as well as the RNA weights. The result can then be inspected as previously. Further information can be found here: MOFA2 GSEA.
pathways_factor <- pathways_NABA_KEGG_GOBP %>%
group_by(term) %>%
pivot_wider(
names_from = gene,
values_from = mor) %>%
as.data.frame()
rownames(pathways_factor) <- pathways_factor$term
pathways_factor <- pathways_factor[,-1]
pathways_factor[is.na(pathways_factor)] <- 0
# Proteomics
model <- load_model('results/mofa/mofa_res_10factor.hdf5')
## Warning in load_model("results/mofa/mofa_res_10factor.hdf5"): There are duplicated features names across different views. We will add the suffix *_view* only for those features
## Example: if you have both TP53 in mRNA and mutation data it will be renamed to TP53_mRNA, TP53_mutation
## Warning in .quality_control(object, verbose = verbose): Factor(s) 1 are strongly correlated with the total number of expressed features for at least one of your omics. Such factors appear when there are differences in the total 'levels' between your samples, *sometimes* because of poor normalisation in the preprocessing steps.
model_PCGSA <- model
features_names(model_PCGSA)$proteo <- gsub("_proteo","",features_names(model_PCGSA)$proteo)
factor_GSEA_prot <- MOFA2::run_enrichment(model_PCGSA, "proteo", as.matrix(pathways_factor), min.size = 0, factors = "all", sign = "positive", p.adj.method = "fdr") #here as an example positive weights: features with positive weights are “high” in the samples with positive factor values
## Intersecting features names in the model and the gene set annotation results in a total of 1757 features.
##
## Running feature set Enrichment Analysis with the following options...
## View: proteo
## Number of feature sets: 7959
## Set statistic: mean.diff
## Statistical test: parametric
## Subsetting weights with positive sign
##
factor_GSEA_prot$pval.adj[is.na(factor_GSEA_prot$pval.adj)] <- 1
factor_GSEA_prot$set.statistics[is.na(factor_GSEA_prot$set.statistics)] <- 0
plot_enrichment_heatmap(factor_GSEA_prot)
grid.arrange(plot_enrichment(factor_GSEA_prot, factor=1) + ggtitle("Factor1"),
plot_enrichment(factor_GSEA_prot, factor=2) + ggtitle("Factor2"),
plot_enrichment(factor_GSEA_prot, factor=3) + ggtitle("Factor3"),
plot_enrichment(factor_GSEA_prot, factor=4) + ggtitle("Factor4"),
plot_enrichment(factor_GSEA_prot, factor=5) + ggtitle("Factor5"))
grid.arrange(plot_enrichment(factor_GSEA_prot, factor=6) + ggtitle("Factor6"),
plot_enrichment(factor_GSEA_prot, factor=7) + ggtitle("Factor7"),
plot_enrichment(factor_GSEA_prot, factor=8) + ggtitle("Factor8"),
plot_enrichment(factor_GSEA_prot, factor=9) + ggtitle("Factor9"))
plot_enrichment_detailed(factor_GSEA_prot, factor = 4, max.genes = 10)
## Warning: ggrepel: 17 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
factor_GSEA_prot_df <- data.frame(rownames(factor_GSEA_prot[["set.statistics"]]), factor_GSEA_prot[["set.statistics"]][,4], factor_GSEA_prot$pval.adj[,4])
colnames(factor_GSEA_prot_df) <- c("source", "score", "p_value")
ggplot(factor_GSEA_prot_df[factor_GSEA_prot_df$score > 4,]) +
geom_point(aes(score, reorder(source, score, increasing = T), color = p_value)) + coord_flip() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
#Transcriptomics
features_names(model_PCGSA)$RNA <- gsub("_RNA","",features_names(model_PCGSA)$RNA)
factor_GSEA_RNA <- MOFA2::run_enrichment(model_PCGSA, "RNA", as.matrix(pathways_factor), min.size = 0, factors = "all", sign = "positive", p.adj.method = "fdr")
## Intersecting features names in the model and the gene set annotation results in a total of 5133 features.
##
## Running feature set Enrichment Analysis with the following options...
## View: RNA
## Number of feature sets: 7959
## Set statistic: mean.diff
## Statistical test: parametric
## Subsetting weights with positive sign
##
factor_GSEA_RNA$pval.adj[is.na(factor_GSEA_RNA$pval.adj)] <- 1
factor_GSEA_RNA$set.statistics[is.na(factor_GSEA_RNA$set.statistics)] <- 0
plot_enrichment_heatmap(factor_GSEA_RNA)
grid.arrange(plot_enrichment(factor_GSEA_RNA, factor=1) + ggtitle("Factor1"),
plot_enrichment(factor_GSEA_RNA, factor=2) + ggtitle("Factor2"),
plot_enrichment(factor_GSEA_RNA, factor=3) + ggtitle("Factor3"),
plot_enrichment(factor_GSEA_RNA, factor=4) + ggtitle("Factor4"),
plot_enrichment(factor_GSEA_RNA, factor=5) + ggtitle("Factor5"))
grid.arrange(plot_enrichment(factor_GSEA_RNA, factor=6) + ggtitle("Factor6"),
plot_enrichment(factor_GSEA_RNA, factor=7) + ggtitle("Factor7"),
plot_enrichment(factor_GSEA_RNA, factor=8) + ggtitle("Factor8"),
plot_enrichment(factor_GSEA_RNA, factor=9) + ggtitle("Factor9"))
plot_enrichment_detailed(factor_GSEA_RNA, factor = 4)
factor_GSEA_RNA_df <- data.frame(rownames(factor_GSEA_RNA[["set.statistics"]]), factor_GSEA_RNA[["set.statistics"]][,4], factor_GSEA_RNA$pval.adj[,4])
colnames(factor_GSEA_RNA_df) <- c("source", "score", "p_value")
ggplot(factor_GSEA_RNA_df[factor_GSEA_RNA_df$score > 7.5,]) +
geom_point(aes(score, reorder(source, score, increasing = T), color = p_value)) + coord_flip() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Interestingly, it becomes evident in the protein view that although Factor 4 explains not the most of the variance in the proteomics data, it explains the variance of the proteins in the ECM-related pathways (NABA pathways) important for cell reorganization in cancer. Moreover, since RNA transcripts are intermediates to proteins, transcriptomics data can potentially lead to identify over-activated pathways.
Further downstream analyses of the network model (e.g. using CytoScape) and the MOFA model (e.g. further comparisons before and after COSMOS) can be performed. In order to perform more targeted analyses related to GSA/GSEA and thus reduce the complexity, it is recommended to identify and extract pathways of interest prior to ORA/PCGSA.
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RCy3_2.16.0 tidyr_1.2.1 GSEABase_1.58.0
## [4] graph_1.74.0 annotate_1.74.0 XML_3.99-0.10
## [7] AnnotationDbi_1.58.0 IRanges_2.30.1 S4Vectors_0.34.0
## [10] Biobase_2.56.0 BiocGenerics_0.42.0 gridExtra_2.3
## [13] pheatmap_1.0.12 moon_0.1.0 decoupleR_2.3.2
## [16] liana_0.1.6 reshape2_1.4.4 dplyr_1.0.10
## [19] ggfortify_0.4.14 ggplot2_3.3.6 readr_2.1.2
## [22] MOFA2_1.6.0 cosmosR_1.5.1
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 reticulate_1.26
## [3] tidyselect_1.1.2 RSQLite_2.2.17
## [5] grid_4.2.1 BiocParallel_1.30.3
## [7] Rtsne_0.16 base64url_1.4
## [9] munsell_0.5.0 ScaledMatrix_1.4.1
## [11] codetools_0.2-18 statmod_1.4.37
## [13] scran_1.24.1 pbdZMQ_0.3-8
## [15] future_1.28.0 withr_2.5.0
## [17] colorspace_2.0-3 progressr_0.11.0
## [19] filelock_1.0.2 highr_0.9
## [21] logger_0.2.2 knitr_1.40
## [23] uuid_1.1-0 rstudioapi_0.14
## [25] SingleCellExperiment_1.18.0 listenv_0.8.0
## [27] labeling_0.4.2 MatrixGenerics_1.8.1
## [29] repr_1.1.4 GenomeInfoDbData_1.2.8
## [31] farver_2.1.1 bit64_4.0.5
## [33] rhdf5_2.40.0 basilisk_1.8.1
## [35] parallelly_1.32.1 vctrs_0.4.1
## [37] generics_0.1.3 xfun_0.33
## [39] R6_2.5.1 doParallel_1.0.17
## [41] GenomeInfoDb_1.32.4 clue_0.3-61
## [43] rsvd_1.0.5 RJSONIO_1.3-1.6
## [45] locfit_1.5-9.6 bitops_1.0-7
## [47] rhdf5filters_1.8.0 cachem_1.0.6
## [49] DelayedArray_0.22.0 assertthat_0.2.1
## [51] vroom_1.5.7 scales_1.2.1
## [53] rgeos_0.5-9 gtable_0.3.1
## [55] beachmat_2.12.0 globals_0.16.1
## [57] rlang_1.0.6 GlobalOptions_0.1.2
## [59] broom_1.0.1 checkmate_2.1.0
## [61] yaml_2.3.5 backports_1.4.1
## [63] tools_4.2.1 ellipsis_0.3.2
## [65] jquerylib_0.1.4 RColorBrewer_1.1-3
## [67] Rcpp_1.0.9 plyr_1.8.7
## [69] base64enc_0.1-3 sparseMatrixStats_1.8.0
## [71] progress_1.2.2 zlibbioc_1.42.0
## [73] purrr_0.3.4 RCurl_1.98-1.8
## [75] basilisk.utils_1.8.0 prettyunits_1.1.1
## [77] GetoptLong_1.0.5 cowplot_1.1.1
## [79] SeuratObject_4.1.2 SummarizedExperiment_1.26.1
## [81] ggrepel_0.9.1 cluster_2.1.4
## [83] fs_1.5.2 magrittr_2.0.3
## [85] circlize_0.4.15 matrixStats_0.62.0
## [87] hms_1.1.2 evaluate_0.16
## [89] xtable_1.8-4 readxl_1.4.1
## [91] shape_1.4.6 compiler_4.2.1
## [93] tibble_3.1.8 crayon_1.5.1
## [95] htmltools_0.5.3 later_1.3.0
## [97] tzdb_0.3.0 DBI_1.1.3
## [99] corrplot_0.92 ComplexHeatmap_2.12.1
## [101] rappdirs_0.3.3 Matrix_1.5-1
## [103] cli_3.4.1 uchardet_1.1.1
## [105] parallel_4.2.1 metapod_1.4.0
## [107] igraph_1.3.5 GenomicRanges_1.48.0
## [109] forcats_0.5.2 pkgconfig_2.0.3
## [111] dir.expiry_1.4.0 OmnipathR_3.5.6
## [113] sp_1.5-0 IRdisplay_1.1
## [115] scuttle_1.6.3 xml2_1.3.3
## [117] foreach_1.5.2 bslib_0.4.0
## [119] dqrng_0.3.0 XVector_0.36.0
## [121] rvest_1.0.3 stringr_1.4.1
## [123] digest_0.6.29 Biostrings_2.64.1
## [125] rmarkdown_2.16 cellranger_1.1.0
## [127] uwot_0.1.14 edgeR_3.38.4
## [129] DelayedMatrixStats_1.18.1 curl_4.3.2
## [131] rjson_0.2.21 lifecycle_1.0.2
## [133] jsonlite_1.8.0 Rhdf5lib_1.18.2
## [135] BiocNeighbors_1.14.0 limma_3.52.4
## [137] fansi_1.0.3 pillar_1.8.1
## [139] lattice_0.20-45 KEGGREST_1.36.3
## [141] fastmap_1.1.0 httr_1.4.4
## [143] glue_1.6.2 png_0.1-7
## [145] iterators_1.0.14 bluster_1.6.0
## [147] bit_4.0.4 stringi_1.7.8
## [149] sass_0.4.2 HDF5Array_1.24.2
## [151] blob_1.2.3 BiocSingular_1.12.0
## [153] memoise_2.0.1 IRkernel_1.3.1
## [155] irlba_2.3.5 future.apply_1.9.1